Adaptive EWMA control chart using Bayesian approach under ranked set sampling schemes with application to Hard Bake process

Memory-type control charts, such as the cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) control charts, are preferred for detecting small or moderate shifts in the location parameter of a production process. In this article, a novel Bayesian adaptive EWMA (AEWMA) control chart utilizing ranked set sampling (RSS) designs is proposed under two different loss functions, i.e., the squared error loss function (SELF) and the linex loss function (LLF), with an informative prior distribution, to monitor shifts in the mean of a normally distributed process. Extensive Monte Carlo simulation is used to check the performance of the suggested Bayesian-AEWMA control chart using RSS schemes. The effectiveness of the proposed AEWMA control chart is evaluated through the average run length (ARL) and the standard deviation of run length (SDRL). The results indicate that the proposed Bayesian control chart applying RSS schemes is more sensitive in detecting mean shifts than the existing Bayesian AEWMA control chart based on simple random sampling (SRS). Finally, to demonstrate the effectiveness of the proposed Bayesian-AEWMA control chart under different RSS schemes, we present a numerical example involving the hard-bake process in semiconductor fabrication. Our results show that the Bayesian-AEWMA control chart using RSS schemes outperforms the EWMA and AEWMA control charts utilizing the Bayesian approach under simple random sampling in detecting out-of-control signals.


Bayesian approach
In statistical inference, the classical and Bayesian approaches are the two main estimation methods. The classical method is based only on sample information, while the Bayesian approach uses both prior and sample information when estimating the unknown population parameter. The Bayesian approach is founded on the concept that probability can represent uncertainty, and it permits the incorporation of prior knowledge and uncertainty into the analysis. It is widely used in machine learning, finance, and medical research, where incomplete information and uncertainty are standard. Bayesian inference requires a prior distribution, which may be either informative or non-informative. A conjugate prior is a prior for which the posterior (P) distribution belongs to the same family of distributions as the prior. In this study, X denotes the study characteristic with in-control process mean $\theta$ and variance $\delta^2$, and the normal conjugate prior with parameters $\theta_0$ and $\delta_0^2$ is

$$p(\theta) = \frac{1}{\sqrt{2\pi\delta_0^2}}\exp\left\{-\frac{(\theta-\theta_0)^2}{2\delta_0^2}\right\}.$$

If information about the population parameter is lacking, the prior is taken to be non-informative, so that it has minimal impact on the P distribution. In many cases a non-informative prior is assumed to be proportional to a uniform distribution, which assigns equal probability to all possible parameter values within a specified range. The probability function of the uniform prior is constant over that range, with c denoting the constant of proportionality.
When little or no prior information is available for an unknown population parameter, a non-informative prior is commonly used in Bayesian analysis, since this type of prior has a minimal effect on the posterior distribution. For such situations, Jeffreys 27 proposed a prior distribution that is proportional to the square root of the Fisher information:

$$p(\theta) \propto \sqrt{I(\theta)}. \tag{3}$$

In Eq. (3), $I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X\mid\theta)\right]$ is the Fisher information, which quantifies the information about the parameter that is carried by the data.
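As a worked illustration (added here, not part of the original derivation), the Jeffreys prior can be computed for the case used later in this study: a normal model with unknown mean $\theta$ and known variance $\delta^2$. The sample size $n$ and the observations $x_1, \dots, x_n$ are symbols assumed for this example only.

```latex
\begin{aligned}
\log f(x_1,\dots,x_n \mid \theta)
  &= -\frac{n}{2}\log\left(2\pi\delta^2\right)
     - \frac{1}{2\delta^2}\sum_{i=1}^{n}(x_i-\theta)^2, \\
\frac{\partial^2}{\partial\theta^2}\log f(x_1,\dots,x_n \mid \theta)
  &= -\frac{n}{\delta^2}, \\
I(\theta) &= -E\left[-\frac{n}{\delta^2}\right] = \frac{n}{\delta^2}, \\
p(\theta) &\propto \sqrt{I(\theta)} = \sqrt{\frac{n}{\delta^2}}.
\end{aligned}
```

Because the final expression does not depend on $\theta$, the Jeffreys prior for a normal mean with known variance is flat in $\theta$, which agrees with the uniform form of a non-informative prior.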
Bayesian statistics combines sample information with a prior distribution to obtain a posterior (P) distribution that encompasses all relevant knowledge about the unknown population parameter. The P distribution is more informative than the prior distribution, because it incorporates both the sample and the prior information. For the parameter $\theta$, it is defined as

$$p(\theta\mid x) = \frac{p(x\mid\theta)\,p(\theta)}{\int p(x\mid\theta)\,p(\theta)\,d\theta}.$$

The posterior predictive (PP) distribution for a new data set Y, which uses the P distribution as its prior, is

$$p(y\mid x) = \int p(y\mid\theta)\,p(\theta\mid x)\,d\theta.$$

In Bayesian methodology, the choice of loss function (LF) is pivotal in reducing the risk associated with the Bayes estimator. In this study, we examine two distinct LFs.
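Squared error loss function. Gauss 28 introduced a symmetric loss function, the SELF, for use in statistical estimation. If $\hat\theta$ is an estimator of the unknown population parameter $\theta$, the SELF is

$$L(\theta,\hat\theta) = (\theta-\hat\theta)^2,$$

and the Bayes estimator under SELF is the posterior mean, $\hat\theta = E_{\theta\mid x}(\theta)$.

Linex loss function. Varian 29 studied an asymmetric loss function called the linex loss function (LLF), which is used to minimize the risk associated with the Bayes estimator when over- and under-estimation are not equally costly. The LLF is defined as

$$L(\theta,\hat\theta) = e^{c(\hat\theta-\theta)} - c(\hat\theta-\theta) - 1,$$

where $c \neq 0$ is the shape parameter of the loss. Under LLF, the Bayes estimator $\hat\theta$ is given by

$$\hat\theta = -\frac{1}{c}\ln E_{\theta\mid x}\left(e^{-c\theta}\right).$$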
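The two Bayes estimators can be illustrated numerically. The sketch below assumes a normal posterior with mean `post_mean` and variance `post_var` (hypothetical names, not from the paper) and a LINEX shape parameter `c`. For a normal posterior, $-\frac{1}{c}\ln E(e^{-c\theta})$ reduces to the posterior mean minus `c * post_var / 2`; the numerical version is included only to check that closed form.

```python
import math


def self_estimate(post_mean: float) -> float:
    """Bayes estimate under squared error loss: the posterior mean."""
    return post_mean


def llf_estimate(post_mean: float, post_var: float, c: float) -> float:
    """Bayes estimate under LINEX loss, assuming a normal posterior.

    -(1/c) * ln E[exp(-c*theta)] = post_mean - c * post_var / 2.
    """
    if c == 0:
        raise ValueError("c must be non-zero")
    return post_mean - c * post_var / 2.0


def llf_estimate_numeric(post_mean: float, post_var: float, c: float,
                         grid: int = 4001, width: float = 10.0) -> float:
    """Same estimate, with E[exp(-c*theta)] approximated by a Riemann sum
    over post_mean +/- width posterior standard deviations."""
    sd = math.sqrt(post_var)
    lo = post_mean - width * sd
    step = 2.0 * width * sd / (grid - 1)
    total = 0.0
    for k in range(grid):
        t = lo + k * step
        dens = math.exp(-0.5 * ((t - post_mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += math.exp(-c * t) * dens * step
    return -math.log(total) / c
```

With a positive `c`, the LLF estimate lies below the SELF estimate, which reflects the heavier penalty the LINEX loss places on over-estimation in that case.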

Ranked set sampling
The notion of RSS was initially presented by McIntyre 30 . This technique involves the following steps for selecting a sample from the population of interest:
1. Select $m^2$ units at random from the studied population and divide them into m sets of equal size m. Rank the m units within each set by the personal judgment of the researcher, by using auxiliary variables, or by any other method that does not require measuring the study variable.
2. After ranking all m sets, select the 1st ranked unit from the first set, the 2nd ranked unit from the second set, and so on, until the m-th ranked unit is selected from the m-th set.
Completing the steps outlined above results in a single cycle of RSS. If required, these steps can be repeated r times, resulting in a sample of size n = rm. Under the RSS scheme, the estimator of the population mean for a single cycle is

$$\bar{X}_{(RSS)} = \frac{1}{m}\sum_{i=1}^{m} X_{i(i)},$$

where $X_{i(j)}$ denotes the j-th ranked unit of the i-th set.

The median ranked set sampling (MRSS) scheme proceeds as follows:
1. Choose $m^2$ units from the population under study using the same method as for RSS. Next, divide the selected units into m sets of equal size and rank the units within each set in ascending order based on the variable of interest.
2. If the set size m is even, select the (m/2)th ranked unit from each of the first m/2 sets and the ((m + 2)/2)th ranked unit from each of the remaining m/2 sets. If m is odd, select the median, i.e., the ((m + 1)/2)th ranked unit, from every set.
This process completes a single cycle of MRSS. If necessary, replicate the process r times to obtain the desired sample size n = rm.
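The RSS and MRSS selection steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: ranking is assumed to be perfect (done by sorting the actual values, whereas in practice ranking is by judgment or an auxiliary variable), and `draw`, `rss_cycle`, `mrss_cycle` and `ranked_sample` are hypothetical names.

```python
from typing import Callable, List


def rss_cycle(draw: Callable[[], float], m: int) -> List[float]:
    """One RSS cycle: m sets of m units, keep the i-th ranked unit of set i."""
    sample = []
    for i in range(m):
        ranked = sorted(draw() for _ in range(m))  # assumption: perfect ranking
        sample.append(ranked[i])
    return sample


def mrss_cycle(draw: Callable[[], float], m: int) -> List[float]:
    """One MRSS cycle: keep the median-ranked unit(s) of each set."""
    sample = []
    for i in range(m):
        ranked = sorted(draw() for _ in range(m))  # assumption: perfect ranking
        if m % 2 == 1:
            idx = (m + 1) // 2 - 1   # ((m+1)/2)th unit, 0-based index
        elif i < m // 2:
            idx = m // 2 - 1         # (m/2)th unit for the first m/2 sets
        else:
            idx = m // 2             # ((m+2)/2)th unit for the remaining sets
        sample.append(ranked[idx])
    return sample


def ranked_sample(cycle: Callable[[Callable[[], float], int], List[float]],
                  draw: Callable[[], float], m: int, r: int) -> List[float]:
    """Repeat a cycle r times to obtain a sample of size n = r * m."""
    out: List[float] = []
    for _ in range(r):
        out.extend(cycle(draw, m))
    return out
```

For example, `ranked_sample(mrss_cycle, lambda: random.gauss(0, 1), 5, 1)` (after `import random`) would give one MRSS cycle of size 5 from a standard normal population; the mean of the returned list is the corresponding single-cycle estimator.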
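Under MRSS, with $X_{i(j)}$ denoting the j-th ranked unit of the i-th set, the estimator of the population mean for odd set size in a single cycle is

$$\bar{X}_{(MRSS)O} = \frac{1}{m}\sum_{i=1}^{m} X_{i\left(\frac{m+1}{2}\right)},$$

with variance

$$Var\left(\bar{X}_{(MRSS)O}\right) = \frac{1}{m^2}\sum_{i=1}^{m} Var\left(X_{i\left(\frac{m+1}{2}\right)}\right).$$

For a single cycle of MRSS with even set size, the estimator of the population mean is

$$\bar{X}_{(MRSS)E} = \frac{1}{m}\left[\sum_{i=1}^{m/2} X_{i\left(\frac{m}{2}\right)} + \sum_{i=m/2+1}^{m} X_{i\left(\frac{m+2}{2}\right)}\right],$$

with variance

$$Var\left(\bar{X}_{(MRSS)E}\right) = \frac{1}{m^2}\left[\sum_{i=1}^{m/2} Var\left(X_{i\left(\frac{m}{2}\right)}\right) + \sum_{i=m/2+1}^{m} Var\left(X_{i\left(\frac{m+2}{2}\right)}\right)\right].$$

Extreme ranked set sampling approach. A modified ranked set sampling approach, known as the extreme ranked set sampling (ERSS) scheme, was proposed by Samawi et al. 32 . This method is particularly useful when ranking every unit in a set is difficult, because only the extreme units need to be identified. The following steps provide a detailed overview of the process for selecting an ERSS sample.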
1. Select $m^2$ units from the population under study and allocate them randomly to m sets of equal size, then rank the units within each set based on the study variable.
2. For ERSS, the selection of units depends on whether the set size m is even or odd. If m is even, select the smallest ranked unit from each of the first m/2 sets and the largest ranked unit from each of the remaining m/2 sets. If m is odd, select the smallest ranked unit from each of the first (m − 1)/2 sets, the largest ranked unit from each of the next (m − 1)/2 sets, and the median unit from the last set.
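The ERSS selection rule can be sketched in the same style. As before, this assumes perfect ranking by sorting the measured values, and `draw` and `erss_cycle` are hypothetical names introduced for illustration.

```python
from typing import Callable, List


def erss_cycle(draw: Callable[[], float], m: int) -> List[float]:
    """One ERSS cycle from m sets of m units each."""
    sample = []
    half = (m - 1) // 2 if m % 2 == 1 else m // 2
    for i in range(m):
        ranked = sorted(draw() for _ in range(m))  # assumption: perfect ranking
        if i < half:
            pick = ranked[0]            # smallest unit from the first group of sets
        elif i < 2 * half:
            pick = ranked[-1]           # largest unit from the second group of sets
        else:
            pick = ranked[(m - 1) // 2] # odd m only: median of the last set
        sample.append(pick)
    return sample
```

For even m the two groups cover all m sets, so the median branch is reached only when m is odd, and then only for the last set.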
If necessary, the complete ERSS procedure is repeated r times to obtain the required sample size n = mr. For odd set size, the ERSS estimator of the unknown population mean for a single cycle can be written as

$$\bar{X}_{(ERSS)O} = \frac{1}{m}\left[\sum_{i=1}^{(m-1)/2} X_{i(1)} + \sum_{i=(m+1)/2}^{m-1} X_{i(m)} + X_{m\left(\frac{m+1}{2}\right)}\right],$$

where $X_{i(j)}$ denotes the j-th ranked unit of the i-th set.

This section describes the suggested AEWMA CC for monitoring variation and detecting small, moderate and large shifts in a normally distributed manufacturing process using different RSS schemes. Consider the study variable X, which follows a normal distribution with mean $\theta$ and variance $\delta^2$. The probability function of X can be expressed as

$$f(x\mid\theta,\delta^2) = \frac{1}{\sqrt{2\pi\delta^2}}\exp\left\{-\frac{(x-\theta)^2}{2\delta^2}\right\}.$$

Let $\hat\delta^{*}_t$ be the sequence of EWMA statistics based on $\{X_t\}$, given by

$$\hat\delta^{*}_t = \psi X_t + (1-\psi)\,\hat\delta^{*}_{t-1},$$

where $\psi$ is a smoothing constant and $\hat\delta^{*}_0 = 0$. The estimator $\hat\delta^{*}_t$ is unbiased for the in-control process and biased for the out-of-control process. Haq et al. 18 proposed an estimator that is unbiased in both the in-control and out-of-control situations, which is given by

$$\hat\delta^{**}_t = \frac{\hat\delta^{*}_t}{1-(1-\psi)^t}.$$

They suggested using $\tilde\delta_t = |\hat\delta^{**}_t|$ to estimate the shift $\delta$. The proposed AEWMA CC under Bayesian theory, applying various ranked-based sampling designs for the process mean, computes its plotting statistic $W_t$ from the sequence $\{\hat\theta_{(RSS_i)LF}\}$. If the plotting statistic crosses the threshold value h, the process is declared out-of-control; otherwise the process is in control.
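The recursion described above can be sketched as follows. This is a hedged illustration only: this section does not state the weight function that maps the estimated shift to a smoothing weight, so `example_weight` below is a hypothetical placeholder rather than the function used by Haq et al. or by the authors. The sketch also assumes that the inputs are standardized so that the in-control mean is zero, and that the chart signals when $|W_t| > h$.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def example_weight(d: float, cap: float = 2.7) -> float:
    """Hypothetical weight: grows linearly with the estimated shift size d
    and equals 1 once d exceeds cap. Placeholder, not the paper's function."""
    return min(1.0, d / cap)


def aewma_chart(values: Iterable[float], psi: float, h: float,
                weight: Callable[[float], float] = example_weight
                ) -> Tuple[List[float], Optional[int]]:
    """Return the path of W_t and the first index t at which |W_t| > h."""
    if not 0.0 < psi <= 1.0:
        raise ValueError("psi must lie in (0, 1]")
    shift_ewma = 0.0          # EWMA estimate of the shift, starts at 0
    w = 0.0                   # plotting statistic, assumed to start at 0
    path: List[float] = []
    signal_at: Optional[int] = None
    for t, x in enumerate(values, start=1):
        shift_ewma = psi * x + (1.0 - psi) * shift_ewma
        # bias correction: divide by 1 - (1 - psi)^t
        shift_unbiased = shift_ewma / (1.0 - (1.0 - psi) ** t)
        g = weight(abs(shift_unbiased))
        w = g * x + (1.0 - g) * w
        path.append(w)
        if signal_at is None and abs(w) > h:
            signal_at = t
    return path, signal_at
```

The bias correction can be seen by feeding a constant input: for `values = [1.0] * 5`, the corrected shift estimate equals 1 at every step, whereas the uncorrected EWMA only approaches 1 gradually from its starting value of 0.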
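If the prior and sampling distributions are both normal, then the P distribution is also normal. The mean and standard deviation of the P distribution are denoted by $\theta_n$ and $\delta_n$ respectively, and $p(\theta\mid x)$ can be written as

$$p(\theta\mid x) = \frac{1}{\sqrt{2\pi\delta_n^2}}\exp\left\{-\frac{(\theta-\theta_n)^2}{2\delta_n^2}\right\},$$

where

$$\theta_n = \frac{n\bar{x}\,\delta_0^2 + \delta^2\theta_0}{\delta^2 + n\delta_0^2} \quad\text{and}\quad \delta_n^2 = \frac{\delta^2\delta_0^2}{\delta^2 + n\delta_0^2}.$$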
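The normal-normal update for the P distribution is simple to compute. The sketch below uses the standard conjugate result for a normal mean with known variance; the function and argument names are hypothetical, and `xbar` would be the sample mean obtained under whichever sampling scheme (SRS, RSS, MRSS or ERSS) is in use.

```python
from typing import Tuple


def normal_posterior(xbar: float, n: int, sigma2: float,
                     prior_mean: float, prior_var: float) -> Tuple[float, float]:
    """Posterior mean and variance of a normal mean with known variance sigma2,
    given a normal prior N(prior_mean, prior_var) and a sample mean xbar of size n."""
    denom = sigma2 + n * prior_var
    post_mean = (n * xbar * prior_var + sigma2 * prior_mean) / denom
    post_var = sigma2 * prior_var / denom
    return post_mean, post_var
```

For instance, with a standard normal prior, unit process variance, `n = 5` and `xbar = 1.0`, the posterior mean is 5/6 and the posterior variance is 1/6: the sample mean is shrunk towards the prior mean, and the shrinkage weakens as n grows.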

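Under SELF, the Bayes estimator using the various RSS schemes is the posterior mean,

$$\hat\theta_{(RSS_i)SELF} = \frac{n\bar{x}_{(RSS_i)}\delta_0^2 + \delta^2\theta_0}{\delta^2 + n\delta_0^2},$$

where $\bar{x}_{(RSS_i)}$ is the sample mean under the i-th ranked-based scheme (RSS, MRSS or ERSS). Because this estimator is linear in $\bar{x}_{(RSS_i)}$, and $\bar{x}_{(RSS_i)}$ is unbiased for $\theta$ in the normal case, its mean and standard deviation are

$$E\left(\hat\theta_{(RSS_i)SELF}\right) = \frac{n\theta\delta_0^2 + \delta^2\theta_0}{\delta^2 + n\delta_0^2} \quad\text{and}\quad sd\left(\hat\theta_{(RSS_i)SELF}\right) = \frac{n\delta_0^2\sqrt{Var\left(\bar{x}_{(RSS_i)}\right)}}{\delta^2 + n\delta_0^2},$$

respectively. The Bayes estimator $\hat\theta_{(RSS_i)LLF}$ for the suggested Bayesian CC using LLF with ranked-based sampling methods follows from $-\frac{1}{c}\ln E_{\theta\mid x}\left(e^{-c\theta}\right)$ evaluated for a normal P distribution with mean $\theta_n$ and variance $\delta_n^2$:

$$\hat\theta_{(RSS_i)LLF} = \frac{n\bar{x}_{(RSS_i)}\delta_0^2 + \delta^2\theta_0}{\delta^2 + n\delta_0^2} - \frac{c}{2}\,\delta_n^2, \qquad \delta_n^2 = \frac{\delta^2\delta_0^2}{\delta^2 + n\delta_0^2},$$

where c is the shape parameter of the LLF. Its mean is $E\left(\hat\theta_{(RSS_i)SELF}\right) - \frac{c}{2}\delta_n^2$ and its standard deviation equals that of $\hat\theta_{(RSS_i)SELF}$, since the two estimators differ only by a constant. The suggested Bayesian-AEWMA CC, which utilizes various RSS schemes for the P and PP distributions, is defined for the PP case on a set of future observations of size h, denoted $y_1, y_2, \ldots, y_h$. The PP distribution is given by

$$p(y\mid x) = \int p(y\mid\theta)\,p(\theta\mid x)\,d\theta,$$

which is normally distributed with mean $\theta_n$ and variance $\delta_1$. The mean and standard deviation of $\hat\theta_{LLF}$ under the PP distribution are derived in the same way from this distribution.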

Simulation study
The efficacy of the suggested AEWMA CC with different ranked-based designs, applying the Bayesian approach under an informative prior distribution, is appraised using the Monte Carlo simulation technique. We take two values of the smoothing constant, ψ = 0.10 and ψ = 0.25, to study the effect of the smoothing constant on the suggested Bayesian-AEWMA CC. The complete simulation steps are as follows.

Estimating the threshold for an in-control ARL.
i. Calculate the mean and variance of both the P distribution and the PP distribution under the different LFs, i.e., $E(\hat\theta_{(RSS_i)LF})$ and $\delta_{(RSS_i)LF}$, employing the standard normal distribution as both the prior and the sampling distribution.
ii. For a fixed value of the smoothing constant ψ, choose a value of h.
iii. From the normal distribution, select a ranked set sample of size n for the in-control process, i.e., $X \sim N(E(\hat\theta), \delta^2)$.
iv. Compute the suggested Bayesian-AEWMA statistic given in Eq. (21), and evaluate the process accordingly.
v. If the process is determined to be in-control, repeat the steps above until an out-of-control signal is observed, and record the number of consecutive in-control samples as the run length.
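The run-length procedure above can be sketched as a small Monte Carlo routine. This is a simplified illustration under several assumptions that are not taken from the paper: ranking is perfect, the prior and sampling distributions are standard normal (so the SELF Bayes estimate is `n * xbar / (1 + n)`), that estimate is fed directly into the adaptive recursion, the weight function `weight` is a hypothetical placeholder, the chart signals when $|W_t| > h$, the mean shift is expressed directly in units of the process standard deviation, and run lengths are truncated at `max_t`. The threshold used in any call is arbitrary, not a calibrated value.

```python
import random
import statistics
from typing import Tuple


def rss_mean(rng: random.Random, mu: float, sigma: float, m: int) -> float:
    """Mean of one RSS cycle (perfect ranking assumed) from N(mu, sigma^2)."""
    picks = []
    for i in range(m):
        ranked = sorted(rng.gauss(mu, sigma) for _ in range(m))
        picks.append(ranked[i])
    return sum(picks) / m


def weight(d: float, cap: float = 2.7) -> float:
    """Hypothetical adaptive weight (placeholder, not the paper's function)."""
    return min(1.0, d / cap)


def run_length(rng: random.Random, shift: float, psi: float, h: float,
               m: int, max_t: int) -> int:
    """Samples drawn until |W_t| > h, truncated at max_t."""
    n = m                               # single cycle, so n = m
    shift_ewma, w = 0.0, 0.0
    for t in range(1, max_t + 1):
        xbar = rss_mean(rng, shift, 1.0, m)
        est = n * xbar / (1.0 + n)      # SELF estimate with N(0, 1) prior, unit variance
        shift_ewma = psi * est + (1.0 - psi) * shift_ewma
        d = abs(shift_ewma / (1.0 - (1.0 - psi) ** t))
        g = weight(d)
        w = g * est + (1.0 - g) * w
        if abs(w) > h:
            return t
    return max_t                        # truncation biases long run lengths downward


def arl_sdrl(shift: float, psi: float, h: float, m: int,
             reps: int, seed: int, max_t: int = 1000) -> Tuple[float, float]:
    """Average and standard deviation of run length over reps replications."""
    rng = random.Random(seed)
    lengths = [run_length(rng, shift, psi, h, m, max_t) for _ in range(reps)]
    return statistics.mean(lengths), statistics.pstdev(lengths)
```

In a calibration loop, `arl_sdrl(0.0, psi, h, m, reps, seed)` would be evaluated for candidate values of `h` until the in-control ARL reaches the desired level, and the same `h` would then be reused with a non-zero `shift`. A much larger `reps` (the study uses 100,000 iterations) and a larger `max_t` would be needed for stable estimates.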
Setting the threshold for out-of-control ARL.
i. Draw a random sample from a normal distribution with a shifted mean, i.e., $X \sim N\left(E(\hat\theta_{LF}) + \delta\frac{\sigma}{\sqrt{n}},\ \delta\right)$.
ii. Calculate $W_t$ and assess the process using the proposed AEWMA CC under the Bayesian approach applying the various RSS designs.
iii. If the process is deemed to be in control, repeat the two preceding steps until the process is declared out-of-control, and record the number of in-control samples as the run length.
iv. Repeat the steps above for 100,000 iterations to determine the run-length profiles.

Table 3. Using LLF, the run length profile using P distribution for Bayesian-AEWMA CC, for ψ = 0.10, n = 5.

Table 6 compares the EWMA and AEWMA CCs with the Bayesian approach using SRS against the proposed CC using RSS schemes under LLF. At larger shifts the ARL values decrease rapidly, which shows the superiority of the proposed CC applying Bayesian theory with various ranked-based sampling designs: it detects out-of-control signals more quickly than the existing Bayesian-EWMA and Bayesian-AEWMA CCs. The ARL plots in Figs. 1, 2 and 3 also show the efficiency of the proposed Bayesian AEWMA CC under RSS schemes. The key findings for the suggested CC using distinct LFs and various ranked-based sampling designs with the P and PP distributions are as follows. The results in Tables 1, 2, 3, 4, 5 and 6 for the Bayesian-AEWMA CC, which utilized different LFs and RSS schemes for both the P and PP distributions, indicate that this method is highly effective in identifying out-of-control signals when compared with the other sampling schemes analyzed in this study. Specifically, the Bayesian-AEWMA CC with MRSS stands out as the most efficient in identifying such signals.
These results demonstrate the superiority of the suggested Bayesian-AEWMA CC in identifying out-of-control signals, and its effectiveness in ensuring that prompt corrective measures can be taken.

Real-life applications
Many analysts in the field of SPC use both actual and simulated datasets to evaluate the performance of CCs. In the current study, we use a real dataset obtained from Montgomery 33 to demonstrate the operation and implementation of the Bayesian AEWMA CC utilizing various ranked-based sampling designs based on the P and PP distributions under two distinct LFs. The dataset contains 45 samples, with each sample containing 5 wafers. The semiconductor production process involves photolithography in conjunction with the hard bake process, and measurements are taken in microns. Samples were collected at hourly intervals, with the first thirty samples representing the in-control process (Phase I), and the next fifteen samples representing the out-of-control process (Phase II). It is important to note that all observations in the Phase-II dataset have been adjusted by adding 0.017, indicating an upward shift in the process mean. Figures 4 and 5 depict the EWMA and AEWMA CCs utilizing Bayesian theory with the P and PP distributions based on SRS under SELF. In Fig. 4, the CC fails to detect any out-of-control signal, while in Fig. 5 the process is declared out-of-control at the 40th sample. Figures 6, 7, and 8 display the proposed Bayesian CC applying SELF with the different RSS methods using the P and PP distributions. They indicate that an out-of-control signal is detected at the 37th sample for RSS, at the 34th sample for MRSS, and at the 36th sample for ERSS. It can be seen from plots 1-8 that the proposed Bayesian-AEWMA CC under RSS methods is more effective in detecting out-of-control signals than both the Bayesian-EWMA and Bayesian-AEWMA CCs using SRS.

Conclusion
This study proposes a novel Bayesian AEWMA CC that applies different ranked-based sampling designs under an informative prior and two different LFs, using the P and PP distributions for the process mean. The outcomes presented in Tables 1, 2, 3, 4, 5 and 6 demonstrate the performance of the proposed CC applying RSS designs compared to the Bayesian-AEWMA CC utilizing SRS. The ARL plots (Figs. 1, 2 and 3) demonstrate the superiority of the proposed Bayesian CC. Additionally, to evaluate the performance of the suggested CC under various ranked-based sampling designs, a numerical example was applied to the hard bake process in semiconductor manufacturing. In this example, the proposed Bayesian-AEWMA CC for both the P and PP distributions was more effective at detecting out-of-control signals than the EWMA and AEWMA CCs using the Bayesian approach under SRS. The Bayesian AEWMA CC utilizing various ranked-based sampling designs proposed in this study can be extended to other memory-type CCs. Additionally, it is worth noting that this approach is not restricted to normal distributions and can be adapted to accommodate data that follow a binomial or Poisson distribution. To incorporate Bayesian updating in those cases, however, the likelihood function would require modification. Expanding the proposed technique to non-normal distributions and other types of control charts (CCs) could provide a more comprehensive picture of the underlying data and help detect subtle variations that may not be evident using conventional statistical methods. This could, in turn, help organizations identify potential quality concerns earlier, take corrective actions more quickly, and reduce the likelihood of costly errors and defects. For instance, in the healthcare sector, extending this approach to CCs could help detect anomalies in patient data, enabling healthcare providers to intervene promptly and provide timely care to patients.
Similarly, this technique could be used in the finance industry to identify fraudulent activities and potential errors in financial transactions. In manufacturing, expanding this approach to non-normal distributions and other types of CCs could help detect variations in the production process, enabling manufacturers to improve their product quality and reduce waste.

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request. Further, no experiments on humans and/or use of human tissue samples were involved in this study.